14. [Preview] Project: Bookworm

Overview

In this project, you will use IBM Watson's NLP Services to create a simple question-answering system. You will first use the Discovery service to pre-process a document collection and extract relevant information. Then you will use the Conversation service to build a natural language interface that can respond to questions.


Getting Started

Clone this repository to your local computer:

https://github.com/udacity/AIND-NLP-Bookworm
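
For example, using git from a terminal:

git clone https://github.com/udacity/AIND-NLP-Bookworm.git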

If you have the AIND Anaconda environment prepared, now is a good time to activate it.
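
For example, if your environment is named aind (an assumption here; substitute whatever name you used when creating it):

source activate aind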

Open the notebook bookworm.ipynb from a terminal using the following command:

jupyter notebook bookworm.ipynb

Then follow the instructions in the notebook.

Note: You may have to install some packages (mentioned in the notebook). To do so, simply open another terminal and use pip.
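
For example, if the notebook relies on the Watson Developer Cloud SDK for Python (an assumption here; check the notebook for the exact package names), the install command would be:

pip install watson-developer-cloud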


Tasks

Complete each task in the notebook by implementing or modifying code wherever there is a TODO comment in a code cell, and by answering any inline questions in the corresponding markdown cells. For example:

Q: What is the overall sentiment detected in this text? Mention the type (positive/negative) and score.

A: Negative, -0.798

Once you have completed all tasks, save the notebook and export it as a PDF or HTML file. Remember to submit both the notebook (.ipynb) and the PDF/HTML export, along with any other files that may be needed, e.g. data files if you use your own (the sample files provided with the project don't need to be submitted).
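
One way to export is from a terminal with nbconvert (the notebook UI's File > Download As menu also works); note that the PDF target requires a working LaTeX installation:

jupyter nbconvert --to html bookworm.ipynb
jupyter nbconvert --to pdf bookworm.ipynb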

Note: Please do not submit your service-credentials.json file; it is meant to be kept secret.


Submission

Note: These instructions are provided as part of the preview, but you will have to wait for the concentrations to open before submitting your project for review.

Upload your completed project as a .zip archive (or a link to a GitHub repo) containing the following files:

  • bookworm.ipynb: Jupyter notebook with all code cells completed, showing output, no errors, and all questions answered.
  • bookworm.pdf or bookworm.html: A PDF or HTML export of the notebook.

Your project will be evaluated on this rubric:

1. Create and configure Discovery service

  • Prepare an environment: The notebook connects to a Discovery service instance and creates or fetches an environment (see the sketch below).
  • Test configuration: The service processes a small sample text and returns enriched output.
  • Analyze test output: All inline questions are answered correctly based on the output, and a word cloud of keywords is shown.
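
The following is a minimal sketch of the environment setup above, assuming the watson-developer-cloud 1.x Python SDK and a service-credentials.json file with a 'discovery' entry holding a username and password (the file layout, API version string, and method names are assumptions and vary by SDK version):

import json
from watson_developer_cloud import DiscoveryV1

# Load credentials kept outside version control (key names assumed; adjust to your file).
with open('service-credentials.json') as f:
    creds = json.load(f)

discovery = DiscoveryV1(
    version='2017-11-07',
    username=creds['discovery']['username'],
    password=creds['discovery']['password'])

# Reuse a writable environment if one exists, otherwise create one.
# (Older SDK releases call this method get_environments() instead.)
environments = discovery.list_environments()['environments']
writable = [e for e in environments if not e.get('read_only')]
if writable:
    environment_id = writable[0]['environment_id']
else:
    new_env = discovery.create_environment(
        name='bookworm-env',
        description='Environment for the Bookworm project')
    environment_id = new_env['environment_id']
print('Using environment:', environment_id)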

2. Ingest documents

  • Prepare a document collection: A collection is created and documents are added to it, one document per paragraph of text.
  • Test query: A simple query is made against the collection and relevant results are returned; inline questions are answered correctly (see the sketch below).
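
Continuing the sketch under the same assumptions (the data file name is hypothetical), creating a collection, adding one document per paragraph, and running a test query might look like:

import io

collection = discovery.create_collection(environment_id=environment_id,
                                          name='bookworm-collection')
collection_id = collection['collection_id']

# Split the source text into paragraphs and add each one as its own JSON document.
with open('data/sample_text.txt') as f:   # hypothetical file name
    paragraphs = [p.strip() for p in f.read().split('\n\n') if p.strip()]

for i, paragraph in enumerate(paragraphs):
    discovery.add_document(environment_id, collection_id,
                           file=io.BytesIO(json.dumps({'text': paragraph}).encode('utf-8')),
                           filename='paragraph-{}.json'.format(i))

# Documents are indexed asynchronously, so allow a little time before querying.
results = discovery.query(environment_id, collection_id,
                          query='your keyword here', count=3)
for doc in results['results']:
    print(doc.get('text', ''))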

3. Parse natural language questions

  • Add intents: At least 3 intents are added to the Conversation service, each with at least 5 example utterances. Inline questions are answered adequately.

  • Add entities: At least 3 entities are added to the Conversation service, each with at least 1 example value. Inline questions are answered adequately.

  • Design dialog flow: An appropriate dialog flow is designed using the Conversation service workspace tool, with at least 3 nodes. Inline questions are answered adequately.

  • Test dialog: A simple one-question dialog is demonstrated in the notebook, showing which node was triggered (see the sketch below).
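
A sketch of the dialog test, again assuming watson-developer-cloud 1.x and a 'conversation' entry in service-credentials.json; the workspace ID and test question are placeholders:

from watson_developer_cloud import ConversationV1

conversation = ConversationV1(
    version='2017-05-26',
    username=creds['conversation']['username'],
    password=creds['conversation']['password'])

workspace_id = 'YOUR-WORKSPACE-ID'   # copy this from the Conversation workspace tool

question = 'Who discovered penicillin?'   # placeholder test question
response = conversation.message(workspace_id=workspace_id,
                                input={'text': question})   # some SDK versions use message_input=

print('Intents:      ', response['intents'])
print('Entities:     ', response['entities'])
print('Nodes visited:', response['output'].get('nodes_visited'))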

4. Query document collection to fetch answers

  • Process sample question: A sample question is run through the Conversation service. The identified intent and entities are extracted, along with (optionally) the dialog node that was triggered.

  • Query the collection: A query is designed based on the information extracted in the previous step and run against the Discovery service collection.

  • Process returned results: Results returned by the Discovery service are processed to produce a specific response to the natural language question that was asked (see the sketch below).
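
Putting the pieces together, a sketch (same assumptions as above) of turning the parsed question into a Discovery query and extracting a short answer:

# Build a Discovery query from the entity values the Conversation service extracted;
# fall back to the raw question text if no entities were detected.
entity_values = [e['value'] for e in response['entities']]
query_string = ' '.join(entity_values) if entity_values else question

results = discovery.query(environment_id, collection_id,
                          query=query_string, count=1)

# Return the most relevant paragraph as the answer; a simple heuristic that
# could be refined using the detected intent or the enrichment fields.
if results['results']:
    print('Answer:', results['results'][0].get('text', ''))
else:
    print('Sorry, no relevant passage was found.')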

5. Reflections

  • Reflections: The inline question is answered adequately, including the strengths and weaknesses of an API-based solution like this one.

(Optional) Extensions

  • Try with a different dataset (submit data files along with the notebook, or include instructions on how to fetch them).
  • Deploy as a Bluemix application.